Simple Linear Regression

STA 210 - Spring 2022

Dr. Mine Çetinkaya-Rundel

Topics

  • Use simple linear regression to describe the relationship between a quantitative predictor and a quantitative response variable.
  • Estimate the slope and intercept of the regression line using the least squares method.
  • Interpret the slope and intercept of the regression line.

Movie ratings data

The data set contains the “Tomatometer” score (critics) and audience score (audience) for 146 movies rated on Rotten Tomatoes.

Movie ratings data

We want to fit a line to describe the relationship between the critics score and audience score.

Terminology

The response, Y, is the variable describing the outcome of interest.

The predictor, X, is the variable we use to help understand the variability in the response.

Regression model

A regression model is a function that describes the relationship between the response, \(Y\), and the predictor, \(X\).

\[\begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}\]

Regression model + residuals

\[\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \color{blue}{\textbf{Error}} \\[5pt] &= \color{purple}{\mathbf{f(X)}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] &= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] \end{aligned}\]

Simple linear regression

When we have a quantitative response, \(Y\), and a single quantitative predictor, \(X\), we can use a simple linear regression model to describe the relationship between \(Y\) and \(X\).

\[\begin{aligned} Y &= \mathbf{\beta_0 + \beta_1 X} + \epsilon \end{aligned}\]

\[\boldsymbol{\beta}_1: \text{Slope} \hspace{20mm} \boldsymbol{\beta}_0: \text{Intercept}\]

Simple linear regression

\[\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}\]

How do we choose values for \(\hat{\beta}_1\) and \(\hat{\beta}_0\)?

Residuals

\[\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}\]

Least squares line

  • The residual for the \(i^{th}\) observation is

\[e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i\]

  • The sum of squared residuals is

\[e^2_1 + e^2_2 + \dots + e^2_n\]

  • The least squares line is the one that minimizes the sum of squared residuals
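
To make the criterion concrete, here is a minimal Python sketch. The five data points and the helper name `ssr` are hypothetical, chosen only for illustration, and are not the movie ratings data. The closed-form estimates are the standard least squares solution, so nudging either coefficient away from them should increase the sum of squared residuals.

```python
# Hypothetical toy data (not the movie ratings data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

# Closed-form least squares estimates of the slope and intercept.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1_hat = s_xy / s_xx
b0_hat = y_bar - b1_hat * x_bar

best = ssr(b0_hat, b1_hat, x, y)
# Nudge the slope, then the intercept, away from the estimates.
steeper = ssr(b0_hat, b1_hat + 0.1, x, y)
shifted = ssr(b0_hat + 0.1, b1_hat, x, y)
print(best, steeper, shifted)
```

In this toy example, both perturbed lines give a larger sum of squared residuals than the least squares line.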

Estimating the slope

\[\large{\hat{\beta}_1 = r \frac{s_Y}{s_X}}\]

\[ \begin{aligned} s_X &= 30.169 \\ s_Y &= 20.024 \\ r &= 0.781 \end{aligned} \]

\[ \begin{aligned} \hat{\beta}_1 &= 0.781 \times \frac{20.024}{30.169} \\ &= 0.518\end{aligned} \]
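
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the inputs are the rounded summary statistics shown on the slide.

```python
# Rounded summary statistics reported on the slide.
s_x = 30.169  # standard deviation of the predictor (critics)
s_y = 20.024  # standard deviation of the response (audience)
r = 0.781     # correlation between critics and audience

# Slope estimate: correlation scaled by the ratio of standard deviations.
slope = r * s_y / s_x
print(round(slope, 3))  # 0.518
```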

Estimating the intercept

\[\large{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}}\]

\[\begin{aligned} &\bar{x} = 60.850 \\ &\bar{y} = 63.877 \\ &\hat{\beta}_1 = 0.518 \end{aligned}\]

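\[ \begin{aligned}\hat{\beta}_0 &= 63.877 - 0.518 \times 60.850 \\ &\approx 32.296 \end{aligned} \]

Note: with the rounded values shown, the arithmetic gives about 32.357. The reported 32.296 presumably comes from carrying the unrounded slope (about 0.519) through the calculation.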
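
As a rough sketch of how the two formulas work together on raw data, the Python snippet below computes the means, sample standard deviations, and correlation, then plugs them into the slope and intercept formulas. The four data points are hypothetical and are not the movie ratings data. Only the standard library `statistics` module is used.

```python
import statistics

# Hypothetical toy data (not the movie ratings data).
x = [2, 4, 6, 8]
y = [3, 7, 5, 9]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample standard deviations

# Sample correlation coefficient, computed from its definition.
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Slope and intercept from the formulas on the slides.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar
print(round(slope, 3), round(intercept, 3))
```

Keeping the slope unrounded until the final step avoids the small rounding differences that appear when intermediate values are rounded first.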

Interpreting slope & intercept

\[\widehat{\text{audience}} = 32.296 + 0.518 \times \text{critics}\]

  • Slope: For every one-point increase in the critics score, we expect the audience score to increase by 0.518 points, on average.
  • Intercept: If the critics score is 0 points, we expect the audience score to be 32.296 points, on average.
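
Both interpretations can be read directly off the fitted equation. The Python sketch below uses the coefficients from the slide; the function name `predict_audience` is hypothetical and used only for illustration.

```python
# Fitted coefficients from the slide.
INTERCEPT = 32.296
SLOPE = 0.518

def predict_audience(critics):
    """Predicted audience score for a given critics score."""
    return INTERCEPT + SLOPE * critics

# Intercept: the prediction when the critics score is 0.
print(predict_audience(0))  # 32.296

# Slope: the difference between predictions one critics point apart.
print(round(predict_audience(51) - predict_audience(50), 3))  # 0.518
```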

Does it make sense to interpret the intercept?

✅ Interpret the intercept if

  • the predictor can feasibly take values equal to or near zero.
  • there are values near zero in the data.

🛑 Otherwise, don’t interpret the intercept!

Recap

  • Used simple linear regression to describe the relationship between a quantitative predictor and a quantitative response variable.
  • Used the least squares method to estimate the slope and intercept.
  • Interpreted the slope and intercept.
    • Slope: For every one-unit increase in \(x\), we expect \(y\) to change by \(\hat{\beta}_1\) units, on average.
    • Intercept: If \(x\) is 0, then we expect \(y\) to be \(\hat{\beta}_0\) units.